AI Engineering

Author: Zhen (Leo) Liu

This textbook presents the basic knowledge and essential toolsets needed by people who want to step into artificial intelligence (AI). It is especially suitable for college students, graduate students, instructors, and IT hobbyists with an engineering mindset, i.e., one that values getting the job done quickly and neatly with an adequate understanding of why and how. The book is designed to help readers obtain a big picture of AI and its essential topics within the shortest amount of time.

1 Preface


2 Introduction to Artificial Intelligence

----2.1 Overview
----2.2 Introduction to Artificial Intelligence
--------2.2.1 Why look into AI?
--------2.2.2 What is AI?
--------2.2.3 History of AI
------------Birth (1952-1956)
------------Symbolic AI (1956-1974)
------------First AI Winter (1974-1980)
------------Expert System and Connectionism Bloom (1980-1987)
------------Second AI Winter (1987-1993)
------------Recovery (1993-2011)
------------Deep Learning and Big Data Rise (2011-present)
--------2.2.4 AI vs. Traditional Engineering Methods
------------Practice: Prediction of Object Flying Trajectory (Physics Methods vs. Data Methods)
--------2.2.5 AI Applications
------------AI Applications in All Sectors
------------AI Applications in Engineering
----2.3 Basics of AI
--------2.3.1 Basic Concepts
------------Key Machine Learning Elements
------------Data Format
------------Machine Learning Workflow
--------2.3.2 Common Algorithms
------------Overview and Machine Learning Tasks
------------Supervised Learning
------------Unsupervised Learning
------------Reinforcement Learning
------------Semi-Supervised Learning
------------Summary
--------2.3.3 Challenges and Issues in Machine Learning
------------Data Issues
------------Inductive Bias
------------Underfitting and Overfitting
----2.4 Practice: Gain First Experience with AI via a Machine Learning Task
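
A minimal companion sketch for the practice in 2.4 (not the book's own code), assuming scikit-learn is installed; the dataset and classifier are illustrative choices for a first supervised learning task.

    # First ML experience: load data, split, train, and evaluate.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    model = KNeighborsClassifier(n_neighbors=5)
    model.fit(X_train, y_train)                            # training
    print("test accuracy:", model.score(X_test, y_test))   # evaluation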

3 Tools for Artificial Intelligence

----3.1 Overview of Tools for AI
----3.2 Python
--------3.2.1 Introduction to Python Coding Environment
--------3.2.2 Basics
--------3.2.3 Variables and Datatypes
--------3.2.4 Operators
--------3.2.5 Conditional Control Statements
--------3.2.6 Sequential Control Statements
--------3.2.7 Functions
--------3.2.8 Input and Output
--------3.2.9 Advanced Python Functionality
----3.3 Data Manipulation and Visualization
--------3.3.1 NumPy
------------NumPy Array
------------Array Constructions
------------Array Operations
--------3.3.2 Pandas
------------From NumPy to Pandas
------------Series
------------DataFrame
--------3.3.3 Matplotlib
------------Pyplot: Procedural Plotting Interface
------------Object-Oriented Plotting Interface
----3.4 General Machine Learning
--------3.4.1 Scikit-learn
------------Data Import
------------Data Preprocessing
------------Using Models
------------Saving Models
----3.5 Deep Learning
--------3.5.1 Deep Learning Frameworks
--------3.5.2 TensorFlow
------------Overview of APIs
------------Computational Graph
------------Variables
------------Placeholders
------------Comprehensive Example
--------3.5.3 Keras
------------Installation and Data Preparation
------------Model Establishment with the Sequential API
------------Model Establishment with the Functional API
------------Training and Result Visualization
----3.6 Reinforcement Learning
--------3.6.1 Overview of RL Tools
--------3.6.2 OpenAI Gym
----3.7 Practice: Use, Compare, and Understand TensorFlow and Keras for Problem Solving
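
A sketch in the spirit of the practice in 3.7, assuming TensorFlow 2.x with its bundled Keras: a small dense network declared with the Sequential API (the Functional API version differs only in how layers are wired together). The data is random and purely illustrative.

    import numpy as np
    from tensorflow import keras

    # Sequential API: stack layers in order.
    model = keras.Sequential([
        keras.Input(shape=(4,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(3, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Random placeholder data, just to exercise the workflow.
    X = np.random.rand(120, 4).astype("float32")
    y = np.random.randint(0, 3, size=120)
    model.fit(X, y, epochs=3, batch_size=16, verbose=0)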

4 Linear Models

----4.1 Overview
----4.2 Basics of Linear Models
--------4.2.1 Simple Explanation of Linear Models
--------4.2.2 General Formulation of Basic Linear Model
----4.3 Other Linear Regression Algorithms
--------4.3.1 Ridge
--------4.3.2 Lasso
----4.4 Logistic Regression for Classification
--------4.4.1 Binary Classification
--------4.4.2 Multiclass Classification
----4.5 Making Linear Models Nonlinear via Kernel Functions
--------4.5.1 Mapping Data to Higher Dimensional Space with Stretching Functions
--------4.5.2 Kernel Functions
----4.6 Practice: Develop Code to Implement the Basic Linear Model
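
A sketch for the practice in 4.6: fitting the basic linear model y = Xw + b from scratch with NumPy via the normal equation w = (X^T X)^(-1) X^T y. The synthetic data is illustrative; real code should guard against a singular X^T X (e.g., with the ridge regularization of 4.3.1).

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((50, 2))
    y = X @ np.array([2.0, -1.0]) + 0.5 + 0.01 * rng.standard_normal(50)

    Xb = np.hstack([X, np.ones((50, 1))])        # append a bias column
    w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)     # normal equation
    print(w)                                     # approx [2.0, -1.0, 0.5]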

5 Decision Trees

----5.1 Overview
----5.2 Basics of Decision Trees
----5.3 Classic Decision Tree Algorithms
--------5.3.1 ID3 Algorithm
--------5.3.2 C4.5 Algorithm
--------5.3.3 CART Algorithm
--------5.3.4 Implementation
----5.4 Issues and Techniques: Overfitting and Pruning
--------5.4.1 Pre-pruning
--------5.4.2 Post-pruning
------------Cost-Complexity Pruning (CCP)
------------Reduced Error Pruning (REP)
------------Pessimistic Error Pruning (PEP)
------------Minimum Error Pruning (MEP)
------------Comparison and Summary
----5.5 Practice: Decision Trees in Scikit-learn: Training, Tree Plot, and Testing
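
A sketch matching the practice in 5.5, assuming scikit-learn and Matplotlib are installed: train a CART-style tree, plot it, and test it; max_depth serves as a simple pre-pruning control.

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, plot_tree

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = DecisionTreeClassifier(max_depth=3, random_state=0)  # pre-pruning
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))

    plot_tree(clf, filled=True)   # tree plot of the learned splits
    plt.show()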

6 Support Vector Machines

----6.1 Overview
----6.2 Basics of SVM: Hard Margin SVM
--------6.2.1 Basic Formulation
--------6.2.2 Dual Formulation
----6.3 Generalization of SVM: Kernel Methods
----6.4 Soft Margin SVM
--------6.4.1 Basic Formulation
--------6.4.2 Dual Formulation
----6.5 More about SVM
--------6.5.1 SMO Algorithm
--------6.5.2 SVM for Multiclass Classification and Regression
----6.6 Practice: Use of SVMs in Scikit-learn for Classification and Regression
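
A sketch for the practice in 6.6: scikit-learn's SVC (soft-margin classification) and SVR (regression), both using the RBF kernel from 6.3. Data and hyperparameters are illustrative.

    import numpy as np
    from sklearn.svm import SVC, SVR

    rng = np.random.default_rng(0)
    X = rng.random((100, 2))
    y_cls = (X.sum(axis=1) > 1).astype(int)   # toy class labels
    y_reg = np.sin(2 * np.pi * X[:, 0])       # toy regression targets

    clf = SVC(kernel="rbf", C=1.0).fit(X, y_cls)      # soft-margin SVM
    print("classification accuracy:", clf.score(X, y_cls))

    reg = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y_reg)
    print("regression R^2:", reg.score(X, y_reg))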

7 Bayesian Algorithms

----7.1 Overview
----7.2 Statistics Background for Machine Learning
--------7.2.1 Statistics and Machine Learning
--------7.2.2 Frequentists and Bayesians
--------7.2.3 Overview of Statistical Inference
--------7.2.4 Maximum Likelihood Estimation (MLE)
--------7.2.5 Bayesian Estimation
----7.3 Parametric Bayesian Methods
--------7.3.1 Naive Bayes Classifier
--------7.3.2 Semi-Naive Bayesian Classifier
------------One-Dependent Estimator (ODE)
------------Variations of ODE
------------Tree-Augmented Naive Bayes (TAN)
--------7.3.3 Bayesian Network
------------Structure
------------Implementation
----7.4 Bayesian Nonparametrics
--------7.4.1 Parametric vs. Nonparametric Models
------------Overview
------------Parametric Models
------------Nonparametric Models
------------From Parametric to Nonparametric Bayesian Algorithms
--------7.4.2 Gaussian Processes
------------Introduction to Gaussian Process
------------Modeling Functions using Multivariate Gaussian
------------Making Predictions using a Prior and Observations
------------Example
------------Summary
----7.5 Practice: Code Gaussian Naive Bayes Classifier, Try Bayesian Network, and Apply Gaussian Process
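
A sketch touching two parts of the practice in 7.5, assuming scikit-learn: a Gaussian naive Bayes classifier and a Gaussian-process regressor. The Bayesian-network part typically needs a dedicated library (e.g., pgmpy) and is omitted here.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    X = np.array([[1.0], [2.0], [3.0], [4.0]])
    nb = GaussianNB().fit(X, [0, 0, 1, 1])    # per-class Gaussian likelihoods
    print(nb.predict([[1.5]]))

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
    gp.fit(X, np.sin(X).ravel())
    mean, std = gp.predict([[2.5]], return_std=True)  # posterior mean and std
    print(mean, std)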

8 Artificial Neural Networks

----8.1 Overview
----8.2 Basics of Artificial Neural Networks
--------8.2.1 From Biological Neural Network to ANN
--------8.2.2 Activation Function
--------8.2.3 Perceptron
--------8.2.4 Multiple Layer Feedforward Neural Network
----8.3 Training with Backpropagation
--------8.3.1 Concepts
--------8.3.2 Backpropagation in a 3-Layer Network
--------8.3.3 Backpropagation in Neural Networks with 3+ Layers
----8.4 Implementation
--------8.4.1 Practical Skills
--------8.4.2 Procedure for An Example
--------8.4.3 *Shape and Arrangement of Arrays for Data
----8.5 Other ANN Issues
----8.6 Practice: Modify and Assess the Architecture of an ANN
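
A from-scratch sketch related to the practice in 8.6: one forward and one backward (backpropagation) pass through a 3-layer network in NumPy, with sigmoid activations and a squared-error loss. Rows are samples here, an arbitrary but common arrangement.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    X = rng.random((8, 4))                    # 8 samples (rows), 4 features
    y = rng.integers(0, 2, (8, 1)).astype(float)

    W1 = 0.1 * rng.standard_normal((4, 5))    # input -> hidden
    W2 = 0.1 * rng.standard_normal((5, 1))    # hidden -> output
    lr = 0.1

    h = sigmoid(X @ W1)                       # forward pass
    out = sigmoid(h @ W2)

    d_out = (out - y) * out * (1 - out)       # backward pass (chain rule)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out)                  # gradient-descent updates
    W1 -= lr * (X.T @ d_h)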

9 Deep Learning

----9.1 From Artificial Neural Networks to Deep Learning
--------9.1.1 Overview
--------9.1.2 The First Wave
--------9.1.3 The Second Wave
--------9.1.4 The Third Wave
--------9.1.5 Summary of Enabling Innovations
----9.2 Convolutional Neural Network
--------9.2.1 Convolution
------------Forward Pass
------------Backward Pass
------------Padding and Stride
--------9.2.2 ReLU
--------9.2.3 Pooling
----9.3 Recurrent Neural Network
--------9.3.1 Forward Pass
--------9.3.2 Backward Pass
----9.4 Practical Deep Learning Skills
--------9.4.1 Initialization
------------Overview
------------Xavier Initialization
------------He Initialization
------------LeCun Initialization
------------Batch Normalization
--------9.4.2 Optimization Methods
------------SGD
------------Momentum
------------Nesterov
------------AdaGrad
------------AdaDelta
------------RMSprop
------------Adam
------------Nadam
--------9.4.3 Data Preprocessing and Augmentation
----9.5 Practice: Build AlexNet using Keras to Address MNIST Image Classification
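
A sketch for the practice in 9.5, assuming TensorFlow 2.x: an AlexNet-style stack of convolution, ReLU, pooling, and dropout, scaled down to fit 28x28 MNIST images (the original AlexNet takes much larger inputs and has five convolutional layers).

    from tensorflow import keras

    (x_train, y_train), _ = keras.datasets.mnist.load_data()
    x_train = x_train[..., None].astype("float32") / 255.0   # add channel dim

    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        keras.layers.MaxPooling2D(2),
        keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        keras.layers.MaxPooling2D(2),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dropout(0.5),            # regularization, as in AlexNet
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=1, batch_size=128)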

10 Ensemble Learning

----10.1 Overview
----10.2 Basics of Ensemble Learning
--------10.2.1 Definition
--------10.2.2 Basic Questions
--------10.2.3 Categories of Ensemble Learning Methods
--------10.2.4 Essence of Ensemble Learning
--------10.2.5 History and Challenge
----10.3 Bagging
--------10.3.1 Basic Bagging
--------10.3.2 Random Forest
----10.4 Boosting
--------10.4.1 AdaBoost
------------Loss Function
------------Update on Model Weights
------------Update on Sample Weights/Distribution
------------Pseudo-Code
--------10.4.2 Gradient Boosting
----10.5 Stacking
----10.6 Practice: Code and Evaluate Ensemble Learning Methods
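
A sketch for the practice in 10.6, assuming scikit-learn: a bagging ensemble (random forest) and a boosting ensemble (AdaBoost) evaluated side by side with cross-validation.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)
    for model in (RandomForestClassifier(n_estimators=100, random_state=0),
                  AdaBoostClassifier(n_estimators=100, random_state=0)):
        scores = cross_val_score(model, X, y, cv=5)
        print(type(model).__name__, round(scores.mean(), 3))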

11 Clustering

----11.1 Overview
----11.2 Basics of Unsupervised Learning
--------11.2.1 From Supervised Learning to Unsupervised Learning
--------11.2.2 Framework for Unsupervised Learning
--------11.2.3 Overview of Clustering
----11.3 K-Means Clustering
--------11.3.1 Math Framework of K-Means Algorithm
--------11.3.2 Implementation of K-Means
--------11.3.3 Initialization
--------11.3.4 Selection of K
--------11.3.5 Pros and Cons
----11.4 Mean-Shift Clustering Algorithm
--------11.4.1 Pros and Cons
----11.5 Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
--------11.5.1 Pros and Cons
----11.6 Gaussian Mixture Models (GMM)
--------11.6.1 Pros and Cons
----11.7 Hierarchical Agglomerative Clustering (HAC)
--------11.7.1 Pros and Cons
----11.8 Evaluation of Clustering
--------11.8.1 Overview of Evaluation Metrics
--------11.8.2 Internal Evaluation
------------Silhouette Coefficient
------------Davies-Bouldin Index
------------Dunn Index
--------11.8.3 External Evaluation
------------Rand Index
------------Adjusted Rand Index
------------Normalized Mutual Information (NMI)
------------Fowlkes-Mallows Index
------------Contingency Matrix
----11.9 Practice: Test and Modify Clustering Code for Problem Solving
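
A sketch for the practice in 11.9: k-means on synthetic blobs, with the silhouette coefficient of 11.8.2 used to guide the selection of K.

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score

    X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
    for k in (2, 3, 4, 5):        # compare candidate K values
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        print(k, round(silhouette_score(X, labels), 3))   # higher is better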

12 Dimension Reduction

----12.1 Overview
----12.2 Basics of Dimension Reduction
--------12.2.1 Concepts and Needs
--------12.2.2 Popular Methods and Classification
----12.3 Common Feature Selection Methods
----12.4 Feature Extraction Method 1: Principal Components Analysis
--------12.4.1 Concept and Main Idea
--------12.4.2 Theoretical Basis
------------Deduction based on Minimum Distance
------------Deduction based on Maximum Variance
--------12.4.3 Implementation
----12.5 Feature Extraction Method 2: Linear Discriminant Analysis
--------12.5.1 Concept and Main Idea
--------12.5.2 Theoretical Basis
------------Rayleigh Quotient and Generalized Rayleigh Quotient
------------Binary Classification
------------Multiclass Classification
--------12.5.3 Implementation
----12.6 Practice: Develop and Modify Code for PCA and LDA
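
A sketch for the practice in 12.6, assuming scikit-learn: PCA (unsupervised, maximum-variance directions) and LDA (supervised, class-separating directions) applied to the same data.

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_iris(return_X_y=True)
    X_pca = PCA(n_components=2).fit_transform(X)    # ignores labels
    X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
    print(X_pca.shape, X_lda.shape)                 # both (150, 2)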

13 Anomaly Detection

----13.1 Overview
----13.2 Basics of Anomaly Detection
----13.3 Statistics-Based Methods
--------13.3.1 3 Sigma
--------13.3.2 Z-score
--------13.3.3 Boxplot
--------13.3.4 Grubbs Hypothesis Test
----13.4 Supervised Learning Methods
--------13.4.1 Why Not Use Binary Classification for Anomaly Detection?
--------13.4.2 Modification of Supervised Classification Methods for Anomaly Detection
----13.5 Unsupervised Machine Learning Methods
--------13.5.1 Overview
--------13.5.2 Probabilistic Distribution-Based: HBOS
--------13.5.3 Distance-Based: KNN
--------13.5.4 Density-Based: LOF, COF, INFLO and LoOP
------------Local Outlier Factor (LOF)
------------Connectivity-Based Outlier Factor (COF)
------------Influenced Outlierness (INFLO)
------------Local Outlier Probability (LoOP)
--------13.5.5 Clustering-Based
--------13.5.6 Tree-Based
------------iForest
------------SCiForest
------------RRCF
------------Pros and Cons
----13.6 Semi-Supervised Learning Methods
--------13.6.1 Overview
--------13.6.2 Autoencoder
------------Introduction
------------Preparation: Packages and Data
------------Model Establishment
------------Training and Prediction
------------Results Evaluation and Visualization
----13.7 Anomaly Detection Issues
--------13.7.1 Data Quality
--------13.7.2 Imbalanced Distributions
--------13.7.3 High-Dimensional Data
--------13.7.4 Model Sensitivity
----13.8 Practice: Implement Typical Anomaly Detection Methods
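
A sketch for the practice in 13.8: the statistical 3-sigma rule of 13.3 next to a tree-based detector (scikit-learn's isolation forest), run on data with two planted outliers.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(0, 1, 200), [8.0, -9.0]])  # planted outliers

    z = np.abs((x - x.mean()) / x.std())
    print("3-sigma flags:", np.where(z > 3)[0])               # z-score rule

    iso = IsolationForest(random_state=0).fit(x.reshape(-1, 1))
    print("iForest flags:", np.where(iso.predict(x.reshape(-1, 1)) == -1)[0])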

14 Association Rule Learning

----14.1 Overview
----14.2 Basics of Association Rule Learning
--------14.2.1 Definition
--------14.2.2 Relationships with Other Machine Learning Topics
--------14.2.3 Understanding via History
----14.3 Essential Concepts of Association Rules
--------14.3.1 Items, Itemsets, and Rules
--------14.3.2 Support, Confidence, and Lift
--------14.3.3 Association Rule Analysis using the Concepts
----14.4 Apriori
--------14.4.1 Procedure
--------14.4.2 Implementation with an Example
--------14.4.3 Pros and Cons
----14.5 FP Growth
--------14.5.1 Procedure
--------14.5.2 Item Header Table
--------14.5.3 FP Tree
--------14.5.4 Mining FP Tree for Frequent Itemsets
----14.6 Eclat
--------14.6.1 Procedure
--------14.6.2 Implementation
----14.7 Practice: Perform Association Rule Learning with Eclat
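
A sketch of the core idea behind the practice in 14.7: Eclat represents the data vertically as item -> tid-list, then intersects tid-lists to count the support of larger itemsets. This toy version stops at 2-itemsets.

    from itertools import combinations

    transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
    min_support = 2

    tidlists = {}                         # vertical format: item -> tids
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)

    frequent1 = {item: tids for item, tids in tidlists.items()
                 if len(tids) >= min_support}
    for (i1, t1), (i2, t2) in combinations(frequent1.items(), 2):
        tids = t1 & t2                    # support by tid-list intersection
        if len(tids) >= min_support:
            print({i1, i2}, "support:", len(tids))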

15 Value-Based Reinforcement Learning

----15.1 Overview
----15.2 Basics of Reinforcement Learning
--------15.2.1 Basic Concepts
--------15.2.2 Markov Decision Process
--------15.2.3 Policy Function, State Function, State-Action Function, and Reward Function
--------15.2.4 Implementation of RL Environment
------------Implementation with OpenAI Gym
------------Implementation from Scratch
----15.3 Bellman Equation
--------15.3.1 Formulations of Bellman Equation
--------15.3.2 Deduction of Bellman Equation
--------15.3.3 Use of Bellman Equation in Reinforcement Learning
----15.4 Value-Based RL
--------15.4.1 Overview of RL Algorithms
--------15.4.2 Q Learning and Sarsa
--------15.4.3 Monte Carlo Method
----15.5 Practice: Solve RL Problem using Q Learning
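
A sketch for the practice in 15.5: tabular Q learning on a tiny 5-state corridor implemented from scratch in the spirit of 15.2.4 (no Gym dependency). The agent moves left or right and earns a reward of 1 at the rightmost state.

    import numpy as np

    n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, eps = 0.1, 0.9, 0.3

    rng = np.random.default_rng(0)
    for episode in range(300):
        s = 0
        while s != n_states - 1:
            a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update: off-policy TD target uses max over next actions
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            s = s2

    print(Q.argmax(axis=1))   # learned greedy policy: all 1s (go right)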

16 Policy-Based Reinforcement Learning

----16.1 Overview
----16.2 Policy-Based RL vs. Value-Based RL
----16.3 Basic Concepts
----16.4 Objective Function and Policy Gradient Theorem
--------16.4.1 Objective Function
--------16.4.2 Policy Gradient Theorem
--------16.4.3 Simple Episodic Monte Carlo Implementation of Policy Gradient: REINFORCE V1
--------16.4.4 Strategies for Improving Policy Gradient Implementation
----16.5 Policy Function
--------16.5.1 Linear Policy Function for Discrete Actions: Formulation 1
--------16.5.2 Linear Policy Function for Discrete Actions: Formulation 2
--------16.5.3 Policy Function for Continuous Actions
----16.6 Common Policy Gradient Algorithms
--------16.6.1 More Objective Function Formulations
--------16.6.2 Simple Stepwise Monte Carlo Implementation of Policy Gradient: REINFORCE V2
--------16.6.3 Actor-Critic
--------16.6.4 Actor-Critic with Baseline
--------16.6.5 More Policy Gradient Algorithms
----16.7 Practice: Understand and Modify Policy Gradient Code for Addressing RL Problem
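
A sketch related to the practice in 16.7: REINFORCE in its simplest form, on a one-step, two-action problem. The softmax policy has the gradient grad log pi(a) = onehot(a) - pi, and each update ascends reward times this gradient; the bandit setup here is illustrative only.

    import numpy as np

    rng = np.random.default_rng(0)
    theta = np.zeros(2)                  # one preference per action
    lr = 0.1

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    for episode in range(1000):
        p = softmax(theta)
        a = rng.choice(2, p=p)
        r = rng.normal(1.0 if a == 1 else 0.0, 0.1)  # action 1 pays more
        grad_log_pi = np.eye(2)[a] - p               # softmax score function
        theta += lr * r * grad_log_pi                # REINFORCE update
    print(softmax(theta))                # probability mass shifts to action 1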

17 Appendices

----17.1 Overview
----17.2 Mathematics for Machine Learning
--------17.2.1 Statistics
------------Random Variables
------------Probabilities
------------Use of Probability in Machine Learning
------------Probability Distributions
--------17.2.2 Information Theory
--------17.2.3 Array Operations
------------Matrix Operations
------------General Array Operations
------------Array Calculus
----17.3 Optimization
--------17.3.1 Gradient-Based Methods
--------17.3.2 Newton’s Method and Quasi-Newton Methods
--------17.3.3 Conjugate Gradient Methods
--------17.3.4 Expectation Maximization Methods
----17.4 Evaluation Metrics
--------17.4.1 Overview and Basics
--------17.4.2 Classification: Binary
------------Confusion Matrix
------------ROC and AUC
------------Logarithmic Loss
--------17.4.3 Classification: Multi-Class
------------Indirect Methods
------------Confusion Matrix
------------Logarithmic Loss
------------Kappa Coefficient
------------Hinge Loss
--------17.4.4 Classification: Multi-Label
------------Hamming Distance
------------Jaccard Similarity Coefficient
--------17.4.5 Regression
------------Root Mean Squared Error
------------Mean Absolute Error
------------Mean Squared Error
------------Root Mean Squared Logarithmic Error
------------R2 and Adjusted R2
--------17.4.6 Clustering
------------Inertia and Dunn Index
------------Silhouette
------------Davies-Bouldin Index
------------Calinski-Harabasz Index
------------Adjusted Rand Index
------------Adjusted Mutual Information
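
A sketch illustrating a few of the metrics listed in 17.4 with scikit-learn; the labels, scores, and targets are made up for demonstration.

    from sklearn.metrics import (confusion_matrix, roc_auc_score,
                                 mean_squared_error, r2_score)

    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]          # classifier scores
    y_pred = [int(s > 0.5) for s in y_score]

    print(confusion_matrix(y_true, y_pred))  # rows: true, columns: predicted
    print("AUC:", roc_auc_score(y_true, y_score))

    print("MSE:", mean_squared_error([3.0, 5.0], [2.5, 5.5]))
    print("R2:", r2_score([3.0, 5.0], [2.5, 5.5]))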

Enjoy and Build the AI World
